-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve contains_column
by invoking contains_table
#14238
Improve contains_column
by invoking contains_table
#14238
Conversation
Performance comparison between before and after this PR shows that the new implementation can bring up to 2.2x speedups over the old for large data and can be 15% slower than the current code in the worst scenario: Benchmark Time CPU Time Old Time New CPU Old CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------
Search/ColumnContains_AllValid/1024/manual_time -0.0201 -0.0147 0 0 0 0
Search/ColumnContains_AllValid/4096/manual_time +0.1479 +0.1027 0 0 0 0
Search/ColumnContains_AllValid/32768/manual_time +0.3774 +0.2879 0 0 0 0
Search/ColumnContains_AllValid/262144/manual_time +0.1825 +0.1631 0 0 0 0
Search/ColumnContains_AllValid/2097152/manual_time -0.4995 -0.4977 4 2 4 2
Search/ColumnContains_AllValid/16777216/manual_time -0.4629 -0.4627 42 23 42 23
Search/ColumnContains_AllValid/67108864/manual_time -0.5701 -0.5700 82 35 82 35
Search/ColumnContains_Nulls/1024/manual_time +0.0138 +0.0081 0 0 0 0
Search/ColumnContains_Nulls/4096/manual_time +0.0807 +0.0570 0 0 0 0
Search/ColumnContains_Nulls/32768/manual_time +0.4188 +0.3202 0 0 0 0
Search/ColumnContains_Nulls/262144/manual_time +0.4772 +0.4089 0 0 0 0
Search/ColumnContains_Nulls/2097152/manual_time -0.4655 -0.4635 4 2 4 2
Search/ColumnContains_Nulls/16777216/manual_time -0.4141 -0.4139 38 22 38 22
Search/ColumnContains_Nulls/67108864/manual_time -0.5455 -0.5453 77 35 77 35
OVERALL_GEOMEAN -0.1737 -0.1887 0 0 0 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've wanted this change for a long time! Thanks @PointKernel. The benchmarks look good for large sizes, despite the minor slowdown for medium sizes.
/merge |
Description
Part of ##12261
This PR simplifies the
contains_column
implementation by invokingcontains_table
and gets rid of the use of the cudfunordered_multiset
. It also removes theunordered_multiset
header file from libcudf.Checklist